feat: add AICB workload generator with Qwen3/Qwen3.5 training mocks #289

Open
yanzhenghao wants to merge 1 commit into aliyun:master from yanzhenghao:pr-aicb-qwen3

yanzhenghao commented Jun 15, 2026

Summary

Flattens the AICB submodule into the main repo and adds Qwen3/Qwen3.5 training workload mocks. One clean commit on top of origin/master.


MockedQwen3.py -- Qwen3 dense training workloads (461 lines)

Supports all 6 Qwen3 dense model sizes: 0.6B, 1.7B, 4B, 8B, 14B, 32B.

Architectural correctness vs LLaMA/Megatron:

  • GQA: separate num_key_value_heads for K/V projections (Megatron hardcodes MHA)
  • head_dim from config: uses explicit head_dim=128 instead of hidden_size // num_attention_heads. Correctly handles expansion models where head_dim * num_heads > hidden_size (Qwen3-0.6B: 2.00x, Qwen3-4B: 1.60x, Qwen3-32B: 1.60x)
  • QK-Norm: per-head RMSNorm on query and key after projection, always enabled in the Qwen3 architecture. Compute-only -- zero communication impact. Confirmed from transformers source (modeling_qwen3.py lines 248-249)
  • SwiGLU sizing fix: down-projection input is intermediate_size, not 2 * intermediate_size (unlike MegatronMlp which overcounts params by 2x)
  • Qwen3Embedding: no Megatron artifacts (removed 4x vocab multiplier and learned position_embedding; Qwen3 uses RoPE)
  • tie_word_embeddings: lm_head parameter count is set to 0 when tie_word_embeddings=true (0.6B, 1.7B, 4B). Communication unchanged.
  • MoE: reuses MOEMLP from MockedMegatron (128 experts, top-8, no shared experts)
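As a rough check of the head_dim and GQA points above, the projection widths and expansion ratios can be recomputed by hand. This is a minimal sketch, not code from this PR: `AttnConfig` is a hypothetical helper, and the hidden_size / head counts are assumed from the published Qwen3 configs and should be verified against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    """Hypothetical container for the attention shape fields discussed above."""
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: int = 128  # explicit value, NOT hidden_size // num_attention_heads

    @property
    def q_dim(self) -> int:
        # Output width of the Q projection.
        return self.num_attention_heads * self.head_dim

    @property
    def kv_dim(self) -> int:
        # Output width of each of the K and V projections (GQA).
        return self.num_key_value_heads * self.head_dim

    @property
    def expansion(self) -> float:
        # How much wider the Q projection is than the residual stream.
        return self.q_dim / self.hidden_size


# Assumed config values (hidden_size, num_attention_heads, num_key_value_heads).
CONFIGS = {
    "Qwen3-0.6B": AttnConfig(1024, 16, 8),
    "Qwen3-4B": AttnConfig(2560, 32, 8),
    "Qwen3-32B": AttnConfig(5120, 64, 8),
}

for name, cfg in CONFIGS.items():
    print(f"{name}: Q_dim={cfg.q_dim} KV_dim={cfg.kv_dim} expansion={cfg.expansion:.2f}x")
```

Under these assumed values the ratios come out to the 2.00x / 1.60x / 1.60x figures quoted above, which is why deriving head_dim from hidden_size // num_attention_heads would undersize the Q/K/V projections.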

Reuses MegatronColumnLinear, MegatronRowLinear, MOEMLP from MockedMegatron.py -- zero duplication of TP communication primitives.
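To make the SwiGLU sizing fix from the list above concrete, here is a small parameter count comparing the two down-projection sizings. It is a sketch under stated assumptions: `swiglu_params` is a hypothetical function (not the one in MockedQwen3.py), the fused gate/up layout is assumed, and the 4096 / 12288 sizes are illustrative values believed to match Qwen3-8B.

```python
def swiglu_params(hidden_size: int, intermediate_size: int, tp: int = 1,
                  buggy_down_input: bool = False) -> dict:
    """Count SwiGLU MLP weights per tensor-parallel rank (biases ignored)."""
    # Fused gate + up projection: hidden -> 2 * intermediate (column-parallel).
    gate_up = hidden_size * (2 * intermediate_size) // tp
    # After the gate * up elementwise product only `intermediate` features remain,
    # so the down projection consumes intermediate_size, not 2 * intermediate_size.
    down_in = 2 * intermediate_size if buggy_down_input else intermediate_size
    down = down_in * hidden_size // tp
    return {"gate_up": gate_up, "down": down, "total": gate_up + down}


# Illustrative sizes (assumed, not taken from this PR).
fixed = swiglu_params(4096, 12288)
buggy = swiglu_params(4096, 12288, buggy_down_input=True)
print("down-projection overcount:", buggy["down"] / fixed["down"])
```

With the corrected sizing the MLP holds 3 * hidden * intermediate weights in total; the overcounted variant doubles the down-projection term only.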


MockedQwen3_5.py -- Qwen3.5 dense/MoE training workloads (823 lines)

Supports Qwen3.5 dense (0.8B, 2B, 4B, 9B, 27B) and MoE (35B-A3B, 122B-A10B, 397B-A17B).

Hybrid architecture:

  • GatedDeltaNet linear attention on 75% of layers
  • Full attention on 25% of layers (3:1 interleaved pattern, full_attention_interval=4)
  • head_dim=256, partial_rotary_factor=0.25, attn_output_gate=true, MRoPE
  • MoE with shared experts (256-512 experts, top-8/10)
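The 3:1 interleaving above can be sketched as a per-layer type list. This is illustrative only: `layer_types` is a hypothetical helper, the layer count of 24 is arbitrary, and placing full attention on every 4th layer (1-based) is an assumption about how full_attention_interval is applied.

```python
from collections import Counter


def layer_types(num_layers: int, full_attention_interval: int = 4) -> list:
    """Return one attention type per layer for the hybrid pattern."""
    return [
        # Assumption: layers 4, 8, 12, ... (1-based) use full attention.
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"  # GatedDeltaNet layers
        for i in range(num_layers)
    ]


pattern = layer_types(24)  # 24 is an arbitrary example depth
counts = Counter(pattern)
print(pattern[:8])
print({k: v / len(pattern) for k, v in counts.items()})
```

For any layer count divisible by the interval this gives the 75% / 25% split, which is what determines how many layers contribute full-attention versus linear-attention workload items.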

Bug fixes (also benefit Megatron and DeepSeek)

  • MoE backward pass: MOEMLP.backward() called self.permutation() and self.unpermutation() but never passed their return values to workloads.extend(), so the returned items were dropped. As a result, all MoE models (Megatron, Qwen3, Qwen3.5, DeepSeek) reported only ~43-57% of the correct backward communication. Fixed in MockedMegatron.py (2 lines).
  • EP message sizing: MoE TP all-gather/reduce-scatter message sizes now divide by ep_size, fixing a conservative overestimate. Applied to both MockedMegatron.py and MockedDeepSeek.py.
  • SyntaxWarning: added a raw-string prefix to the docstring in aicb/utils/utils.py, which contains the \i and \d sequences that Python otherwise flags as invalid escapes.
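The MoE backward bug follows a common pattern: a helper returns a list of workload items and the caller discards it. The sketch below reproduces that pattern in isolation. `MoEMlpSketch`, its item strings, and the call order are all hypothetical and much simpler than the real MOEMLP class.

```python
class MoEMlpSketch:
    """Toy stand-in for an MoE MLP that emits workload items as strings."""

    def permutation(self, stage: str) -> list:
        # Token dispatch to experts: one all-to-all item in this toy model.
        return [f"{stage}.permutation.all_to_all"]

    def unpermutation(self, stage: str) -> list:
        # Token return from experts: one all-to-all item in this toy model.
        return [f"{stage}.unpermutation.all_to_all"]

    def backward_buggy(self) -> list:
        workloads = ["backward.expert_gemm"]
        self.unpermutation("backward")  # return value silently dropped
        self.permutation("backward")    # return value silently dropped
        return workloads

    def backward_fixed(self) -> list:
        workloads = ["backward.expert_gemm"]
        workloads.extend(self.unpermutation("backward"))
        workloads.extend(self.permutation("backward"))
        return workloads


mlp = MoEMlpSketch()
print(len(mlp.backward_buggy()), "items before the fix")
print(len(mlp.backward_fixed()), "items after the fix")
```

No exception is raised in the buggy variant, which is why a regression test that inspects the backward item list is the practical way to catch it.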

Supporting changes

  • aicb/utils/utils.py: added Qwen3/Qwen3.5 to --frame choices, get_qwen3_params() with --head_dim and --num_key_value_heads CLI args
  • aicb/workload_generator/generate_megatron_workload.py: Qwen3/Qwen3.5 dispatch in __main__
  • aicb/workload_generator/CLAUDE.md: comprehensive architecture docs, verified configs, design patterns
  • aicb/tuning/: scaler, variability, wrapper (previously missing from AICB)
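For orientation, the CLI additions could look roughly like the argparse sketch below. Only the flag names --frame, --head_dim and --num_key_value_heads come from this description; `build_parser`, the exact choice strings, and the defaults are assumptions and may differ from aicb/utils/utils.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; not the actual get_qwen3_params() implementation."""
    parser = argparse.ArgumentParser(description="AICB workload generator (sketch)")
    parser.add_argument(
        "--frame",
        choices=["Megatron", "DeepSeek", "Qwen3", "Qwen3.5"],  # assumed spellings
        default="Megatron",
    )
    # Explicit head_dim so it is not derived from hidden_size // num_attention_heads.
    parser.add_argument("--head_dim", type=int, default=128)
    # GQA: number of K/V heads; None here means "fall back to the number of query heads".
    parser.add_argument("--num_key_value_heads", type=int, default=None)
    return parser


args = build_parser().parse_args(
    ["--frame", "Qwen3", "--head_dim", "128", "--num_key_value_heads", "8"]
)
print(args.frame, args.head_dim, args.num_key_value_heads)
```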

Tests: 73 total (58 new in test_mocked_qwen3.py), all green

| Test | Models | What it verifies |
|---|---|---|
| AG/RS/A2A counts | 6 dense + 2 MoE | Matches per-layer formula exactly |
| Message sizes | 6 dense + 2 MoE | 2 x seq x batch x hidden for ColumnLinear, correct A2A sizing |
| QK-Norm | 6 dense | L q_norm + L k_norm params, each head_dim=128, 0 comm items |
| tie_word_embeddings | 6 dense | lm_head=0 when tied, vocab*hidden/TP otherwise |
| Embedding params | 6 dense | ratio=1.0x (no Megatron 4x multiplier) |
| Head expansion | 6 dense | Q_dim = num_heads * head_dim, KV_dim = num_kv * head_dim |
| A2A symmetry | 2 MoE | fwd_A2A == bwd_A2A |
| MoE backward | 2 MoE | backward not empty (regression test for fix) |
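A few of the formulas in the table can be written out as plain functions. This is a sketch, not the contents of test_mocked_qwen3.py: the function names are hypothetical, the leading factor 2 is assumed to be bytes per element (bf16/fp16), and the sample sizes are illustrative.

```python
def column_linear_msg_bytes(seq_len: int, batch: int, hidden_size: int,
                            bytes_per_elem: int = 2) -> int:
    """Message size for a ColumnLinear collective: 2 x seq x batch x hidden."""
    # Assumption: the factor 2 is the element size in bytes (bf16/fp16).
    return bytes_per_elem * seq_len * batch * hidden_size


def qk_norm_params(num_layers: int, head_dim: int = 128) -> dict:
    """One q_norm and one k_norm weight vector of size head_dim per layer."""
    per_kind = num_layers * head_dim
    return {"q_norm": per_kind, "k_norm": per_kind}


def lm_head_params(vocab_size: int, hidden_size: int, tp: int, tied: bool) -> int:
    """lm_head is counted as 0 when tied, vocab * hidden / TP otherwise."""
    return 0 if tied else vocab_size * hidden_size // tp


# Illustrative values only.
print(column_linear_msg_bytes(seq_len=4096, batch=1, hidden_size=4096))
print(qk_norm_params(num_layers=36))
print(lm_head_params(vocab_size=151936, hidden_size=4096, tp=8, tied=False))
```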

Full test suite: 79 server tests + 73 aicb tests + TypeScript type check -- all green, zero new warnings.

CLAassistant commented Jun 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

yanzhenghao force-pushed the pr-aicb-qwen3 branch 12 times, most recently from 43aa2d0 to 964bb94, on June 15, 2026 14:16